Cross-Lingual Adaptation using Structural Correspondence Learning
Cross-lingual adaptation, a special case of domain adaptation, refers to the
transfer of classification knowledge between two languages. In this article we
describe an extension of Structural Correspondence Learning (SCL), a recently
proposed algorithm for domain adaptation, for cross-lingual adaptation. The
proposed method uses unlabeled documents from both languages, along with a word
translation oracle, to induce cross-lingual feature correspondences. From these
correspondences a cross-lingual representation is created that enables the
transfer of classification knowledge from the source to the target language.
The main advantages of this approach over other approaches are its resource
efficiency and task specificity.
We conduct experiments in the area of cross-language topic and sentiment
classification involving English as source language and German, French, and
Japanese as target languages. The results show a significant improvement of the
proposed method over a machine translation baseline, reducing the relative
error due to cross-lingual adaptation by an average of 30% (topic
classification) and 59% (sentiment classification). We further report on
empirical analyses that reveal insights into the use of unlabeled data, the
sensitivity with respect to important hyperparameters, and the nature of the
induced cross-lingual correspondences.
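The pivot-based mechanism behind SCL can be sketched as follows. Everything here is a hypothetical stand-in: the data is synthetic, the pivot indices play the role of word pairs supplied by the translation oracle, and plain least squares replaces the loss actually used for the pivot predictors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for unlabeled documents (bag-of-words rows)
# from both languages, stacked into one matrix.
n_docs, n_feats = 200, 50
X = rng.random((n_docs, n_feats))

# Hypothetical pivot features: indices of words whose cross-lingual
# correspondence is given by the word translation oracle.
pivots = [0, 1, 2, 3, 4]
non_pivot = [j for j in range(n_feats) if j not in pivots]

# For each pivot, fit a linear predictor of the pivot's occurrence
# from all non-pivot features (least squares as a simple stand-in).
W = []
for p in pivots:
    y = (X[:, p] > 0.5).astype(float)   # binarized pivot occurrence
    w, *_ = np.linalg.lstsq(X[:, non_pivot], y, rcond=None)
    W.append(w)
W = np.column_stack(W)                   # (n_non_pivot, n_pivots)

# An SVD of the pivot-predictor matrix yields a shared subspace;
# projecting documents onto it gives the cross-lingual representation
# on which a source-language classifier can be trained.
k = 3
U, _, _ = np.linalg.svd(W, full_matrices=False)
theta = U[:, :k].T                       # (k, n_non_pivot)
X_cross = X[:, non_pivot] @ theta.T      # (n_docs, k)
print(X_cross.shape)                     # -> (200, 3)
```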
The Argument Reasoning Comprehension Task: Identification and Reconstruction of Implicit Warrants
Reasoning is a crucial part of natural language argumentation. To comprehend
an argument, one must analyze its warrant, which explains why its claim follows
from its premises. As arguments are highly contextualized, warrants are usually
presupposed and left implicit. Thus, comprehension requires not only
language understanding and logical skills but also common sense. In
this paper we develop a methodology for reconstructing warrants systematically.
We operationalize it in a scalable crowdsourcing process, resulting in a freely
licensed dataset with warrants for 2k authentic arguments from news comments.
On this basis, we present a new challenging task, the argument reasoning
comprehension task. Given an argument with a claim and a premise, the goal is
to choose the correct implicit warrant from two options. Both warrants are
plausible and lexically close, but lead to contradicting claims. A solution to
this task will define a substantial step towards automatic warrant
reconstruction. However, experiments with several neural attention and language
models reveal that current approaches do not suffice.

Comment: Accepted as NAACL 2018 Long Paper; see details on the front page.
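As a concrete instance of the task format, the sketch below scores two candidate warrants against the claim and premise by word overlap. This is a deliberately naive baseline on a made-up example instance, not one of the paper's models or dataset items.

```python
def score(warrant, context):
    """Word-overlap score between a warrant and the argument context
    (claim + premise); a naive stand-in for a learned scorer."""
    w = set(warrant.lower().split())
    c = set(context.lower().split())
    return len(w & c) / max(len(w), 1)

def choose_warrant(claim, premise, warrant0, warrant1):
    """Return the index (0 or 1) of the higher-scoring warrant."""
    context = claim + " " + premise
    return 0 if score(warrant0, context) >= score(warrant1, context) else 1

# Hypothetical task instance (not from the dataset):
claim = "Comment sections should be moderated"
premise = "Comment sections are full of abuse"
w0 = "abuse drives readers away from comment sections"
w1 = "moderation is expensive for publishers"
print(choose_warrant(claim, premise, w0, w1))  # -> 0
```

Because the dataset's warrant pairs are deliberately lexically close, such surface-overlap baselines are exactly what the task is designed to defeat.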
TIR 2015 Workshop Preface
Presents the introductory welcome message from the conference proceedings. May include the conference officers' congratulations to all involved with the conference event and publication of the proceedings record.
Demanded Abstract Interpretation
Formal static analysis is seeing increasingly widespread adoption as a tool for verification and bug-finding, but even with powerful cloud infrastructure it can take minutes or hours for a
developer to get analysis results after a code change. This dissertation considers the problem of
making expressive and sophisticated static analyzers interactive by providing analysis results to
developers in as close to real time as possible. While existing techniques offer some demand-driven
or incremental aspects for certain classes of analysis, the fundamental challenge addressed by this
work is doing both for abstract interpretation in arbitrary domains.

This dissertation presents a technique, demanded abstract interpretation, that lifts analysis
computations to a dependency graph structure in which incremental program edits and
demand-driven evaluation of abstract semantics can be handled uniformly. Demanded abstract interpretation
draws inspiration from graph-based approaches to incremental computation, and is not only sound
and terminating but also from-scratch consistent with underlying batch analyses.
The approach is parametric in the choice of abstract domain, supporting a wide range of
analysis problems and enabling the reuse of highly-tuned existing domain implementations in our
demanded analysis framework without requiring any per-domain reasoning about incrementality or
demand. The complex, cyclic, and unbounded dependency structures that arise when analyzing
loops and recursive control flow in an infinite-height domain are a key challenge, which our approach
handles by dynamically extending novel acyclic encodings of such analysis computation.

This dissertation describes and formalizes demanded abstract interpretation techniques for
both intraprocedural analysis and compositional interprocedural analysis. We also present promising
experimental results in a prototype analysis implementation, and describe some extensions to the
framework designed to confront practical resource constraints without sacrificing formal guarantees.
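The core idea, lifting analysis computations onto a dependency graph where demand-driven evaluation and incremental edits are handled uniformly, can be illustrated with a miniature memoizing cell in the spirit of graph-based incremental computation. The interval-analysis example and all names below are illustrative, not the dissertation's actual framework.

```python
class Cell:
    """A node in the analysis dependency graph: computes its abstract
    value on demand, memoizes it, and records dependents so that a
    program edit dirties exactly the affected downstream results."""
    def __init__(self, compute, deps=()):
        self.compute = compute
        self.deps = list(deps)
        self.value = None
        self.dirty = True
        self.dependents = []
        for d in self.deps:
            d.dependents.append(self)

    def get(self):
        """Demand-driven evaluation: recompute only if dirty."""
        if self.dirty:
            self.value = self.compute(*(d.get() for d in self.deps))
            self.dirty = False
        return self.value

    def invalidate(self):
        """Incremental edit: transitively mark dependents dirty."""
        if not self.dirty:
            self.dirty = True
            for d in self.dependents:
                d.invalidate()

# Tiny interval-analysis flavored example (hypothetical program facts):
x = Cell(lambda: (0, 10))                         # x in [0, 10]
y = Cell(lambda xv: (xv[0] + 1, xv[1] + 1), [x])  # y = x + 1
print(y.get())          # -> (1, 11)
x.compute = lambda: (5, 5)                        # simulate a code edit
x.invalidate()
print(y.get())          # -> (6, 6), recomputed on demand
```

From-scratch consistency in this toy means that `y.get()` after an edit equals what a full reanalysis would produce; the dissertation's contribution is establishing this, plus soundness and termination, for arbitrary (including infinite-height, cyclic) abstract domains.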
Retrieval Models for Genre Classification
Genre provides a characterization of a document with respect to its form or functional trait. Genre is orthogonal to topic, rendering genre information a powerful filter technology for information seekers in digital libraries. However, an efficient means for genre classification is an open and controversially discussed issue. This paper gives an overview and presents new results related to automatic genre classification of text documents. We present a comprehensive survey which contrasts the genre retrieval models that have been developed for Web and non-Web corpora. With the concept of genre-specific core vocabularies the paper provides an original contribution related to computational aspects and classification performance of genre retrieval models: we show how such vocabularies are acquired automatically and introduce new concentration measures that quantify the vocabulary distribution in a sensible way. Based on these findings we construct lightweight genre retrieval models and evaluate their discriminative power and computational efficiency. The presented concepts go beyond the existing utilization of vocabulary-centered, genre-revealing features and open new possibilities for the construction of genre classifiers that operate in real time.
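The notions of a genre-specific core vocabulary and a concentration measure can be sketched as follows; the frequency-based acquisition and the token-share measure here are simplified stand-ins for the paper's constructions, and the sample documents are invented.

```python
from collections import Counter

def core_vocabulary(docs, top_k=3):
    """Naive acquisition of a genre's core vocabulary: the top_k most
    frequent terms across the genre's documents (an illustrative
    stand-in for the paper's automatic acquisition procedure)."""
    counts = Counter(w for d in docs for w in d.lower().split())
    return {w for w, _ in counts.most_common(top_k)}

def concentration(doc, core):
    """Share of a document's tokens drawn from the core vocabulary:
    a simple concentration measure over the vocabulary distribution."""
    tokens = doc.lower().split()
    return sum(t in core for t in tokens) / max(len(tokens), 1)

# Hypothetical documents of one genre:
news = ["the minister announced the budget today",
        "the parliament passed the budget law"]
core = core_vocabulary(news)
print(concentration("the budget debate continues", core))
```

A lightweight retrieval model in this spirit classifies a document by which genre's core vocabulary it concentrates on, trading expressive features for real-time efficiency.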
A keyquery-based classification system for CORE
We apply keyquery-based taxonomy composition to compute a classification system for the CORE dataset, a shared crawl of about 850,000 scientific papers. Keyquery-based taxonomy composition can be understood as a two-phase hierarchical document clustering technique that utilizes search queries as cluster labels: In a first phase, the document collection is indexed by a reference search engine, and the documents are tagged with the search queries for which they are relevant—their so-called keyqueries. In a second phase, a hierarchical clustering is formed from the keyqueries within an iterative process. We use the explicit topic model ESA as document retrieval model in order to index the CORE dataset in the reference search engine. Under the ESA retrieval model, documents are represented as vectors of similarities to Wikipedia articles, a methodology proven to be advantageous for text categorization tasks. Our paper presents the generated taxonomy and reports on quantitative properties such as document coverage and processing requirements.
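The ESA representation used for indexing can be sketched as a vector of similarities to concept articles. The two toy "Wikipedia articles" below are hypothetical placeholders, and plain term-frequency cosine similarity stands in for the full ESA weighting.

```python
import math
from collections import Counter

def tf(text):
    """Term-frequency vector of a text."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity of two sparse term-frequency vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical stand-ins for Wikipedia concept articles:
concepts = {
    "Machine learning": "learning algorithms train models from data",
    "Astronomy": "telescopes observe stars planets and galaxies",
}

def esa_vector(doc):
    """ESA-style representation: the document as a vector of
    similarities to the concept articles."""
    d = tf(doc)
    return {name: cosine(d, tf(text)) for name, text in concepts.items()}

v = esa_vector("neural models learn from training data")
print(v)
```

In the paper's pipeline, such vectors feed the reference search engine, so that a document's keyqueries are the queries under which its ESA vector ranks it among the top results.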